One hypothesis to explain how mutations in the same nuclear envelope proteins yield pathologies focused in distinct 
tissues is that as yet unidentified tissue-specific partners mediate the disease pathologies. The nuclear envelope 
proteome was recently determined from leukocytes and muscle. Here the same methodology is applied to liver and a 
direct comparison of the liver, muscle and leukocyte data sets is presented. At least 74 novel transmembrane proteins 
identified in these studies have been directly confirmed at the nuclear envelope. Within this set, RT-PCR, western blot 
and staining of tissue cryosections confirm that the protein complement of the nuclear envelope is clearly distinct from 
one tissue to another. Bioinformatics reveals similar divergence between tissues across the larger data sets. For proteins 
acting in complexes according to interactome data, the whole complex often exhibited the same tissue-specificity. Other 
tissue-specific nuclear envelope proteins identified were known proteins with functions in signaling and gene regulation. 
The high tissue specificity in the nuclear envelope likely underlies the complex disease pathologies and argues that all 
organelle proteomes warrant re-examination in multiple tissues. 



Introduction 

The nuclear envelope (NE) is a double membrane system consisting of the nuclear lamina, inner and outer nuclear membranes and nuclear pore complexes. Though the NE was historically viewed as little more than a barrier and gatekeeper, recent years have linked NE proteins to functions as disparate as DNA damage repair, cell cycle regulation and cell mobility. This range of functions is enabled because NE transmembrane proteins (NETs) in the outer membrane connect to the cytoskeleton and NETs/lamins in the inner membrane interact with chromatin and gene regulatory proteins. Mutations in lamins and NETs, often collectively referred to as the lamina, have been linked to distinct diseases that each exhibit tissue-specific pathologies ranging from muscular dystrophies to neuropathy, dermopathy, lipodystrophy, bone disorders and progeroid aging syndromes. As the proteins mutated in these disorders are all widely expressed, it has been proposed that as yet unidentified tissue-specific partners might mediate the tissue preferences in pathology.

Cellular organelles, in general, are thought to be relatively invariant in their integral protein composition, with the exception of the complement of tissue-specific proteins being synthesized in the ER and functioning at the plasma membrane. The first indication that this may not be the case was a proteomic observation that mitochondria isolated from four different mouse tissues exhibited differences in their protein complement. Despite the significance of this finding, other organelles have not been similarly analyzed for such tissue specificity. Two recent proteomic determinations of blood leukocyte and muscle NE proteomes using identical methodologies each identified some novel proteins previously not reported at the NE. Some differences could be due to tissue- and cell-type specificity while others could reflect differences in the methodologies used from earlier studies. Here, we have employed the same methodology used for the blood leukocyte and muscle NEs to determine the liver NE proteome so that all three tissues could be directly compared.

Comparing these NE proteomes, we find surprisingly few proteins common to all three tissues. Tissue differences determined by direct testing of a subset of confirmed NETs by antibody staining of tissue cryosections, tissue western blot and tissue RT-PCR reflected the tissue differences indicated by the proteome data set comparison. Proteomic tissue specificity in the larger data sets also correlated with expression data from a large-scale transcriptome study. Furthermore, comparison of the proteome data with interactome data revealed that proteins indicated to be in complexes often segregated together into particular tissues. Among the subset of proteins with known functions identified here at the NE, gene ontology (GO) functional assignments suggest that the observed tissue differences in NE composition contribute to signaling and gene regulation. It is reasonable to speculate that many among the hundreds of previously uncharacterized proteins identified in the NE will have similarly important tissue-specific contributions.

Figure 1. Liver NE preparations. (A) NEs were isolated from rat liver by first dounce homogenization of the tissue to release nuclei, followed by separation of many contaminating membranes on sucrose gradients and finally digestion and washing away chromatin. These were further extracted with salt and detergent or NaOH to enrich respectively for proteins associated with the insoluble lamin polymer or proteins embedded in the membrane. (B-D) Electron micrographs of NE preparations. Arrowheads point to places where NPCs are inserted in the membrane. Note that images are taken at the step prior to extraction as no discernible structure was left after NaOH or salt and detergent extraction. Thus much cleaner NEs were subject to mass spectrometry analysis. Scale bar 200 nm. (B) Double membrane from particularly clean NE. (C) Most NEs still had chromatin connected. (D) Contaminants were also observed stuck to NEs such as the fragmented mitochondria shown. (E-F) NEs from the pre-extraction stage were analyzed by western blot for expected contaminants. (E) A microsome fraction was separately isolated from the post-nuclear supernatant and similar

Results 

Nuclear envelope proteomics. Many novel NE proteins were identified in two recent proteomic analyses of rat muscle and human peripheral blood leukocytes that were not identified in earlier studies of rat liver or mouse neuronal tissue culture cells. This finding could indicate that the NE proteome differs significantly between tissues or could reflect a combination of moderate tissue differences together with improvements in the mass spectrometry approaches used and/or a tendency for different contaminants to co-fractionate in a particular tissue. To distinguish these possibilities, a new analysis of rat liver NEs was undertaken that used the same approaches as the recent muscle and leukocyte studies.

Nuclei were first isolated from other cellular organelles by floating contaminating membranes on sucrose, then chromatin was digested and nuclear contents extracted with salt washes and removed by floating on sucrose to generate crude NEs (Fig. 1A). These were further purified by alkali or detergent extraction prior to mass spectrometry because isolated NEs at this stage clearly have some chromatin attached and some contamination from other cellular organelles as determined by electron microscopy (Fig. 1B-D). Nonetheless, even at this stage, the fractionation had effectively separated the NEs from most expected contaminants as judged by the fact that the ER marker calreticulin and the mitochondrial marker porin were undetectable by western blot (Fig. 1E and F).

The ER would be expected to provide the principal transmembrane protein contaminants because the ER membrane is continuous with the outer nuclear membrane. Therefore, some ER proteins will normally reside also in the outer nuclear membrane or, viewed conversely, some NETs likely double as ER proteins. This has already been demonstrated for the NET emerin, which concentrates both in the inner nuclear membrane and in the peripheral ER where it functions in connecting the centrosome to the outer nuclear membrane. To better distinguish NE-specific proteins, a separate microsomal membrane fraction was prepared from the tissue for comparison. NETs enriched in the NE data sets over the microsome data sets could be considered as higher probability candidate NE proteins, though other NETs could still be important because it is estimated that ~40% of proteins occupy multiple cellular compartments. Thus NE proteins identified were considered in total or as enriched in NEs > 5-fold over microsomes based on normalized spectral counts.
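The enrichment criterion described above can be illustrated with a minimal sketch. It assumes a plain normalized spectral abundance factor (NSAF: spectral count divided by protein length, normalized to the sum over all proteins in the sample) rather than the distributed variant (dNSAF) used for the actual data sets, and it assumes that a protein absent from the microsome data counts as NE-enriched. The function names, protein labels, counts and lengths are all hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Plain NSAF: (SpC / length) divided by the sum of (SpC / length) over the sample.

    Simplification: the study used dNSAF, which also distributes shared peptides.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def ne_enriched(ne_nsaf, microsome_nsaf, fold=5.0):
    """Return proteins whose NE abundance exceeds `fold` times the microsome abundance.

    Assumption: proteins not detected in microsomes are treated as enriched.
    """
    enriched = set()
    for protein, ne_value in ne_nsaf.items():
        mm_value = microsome_nsaf.get(protein, 0.0)
        if mm_value == 0.0 or ne_value / mm_value > fold:
            enriched.add(protein)
    return enriched


# Hypothetical spectral counts and protein lengths (amino acids).
lengths = {"A": 100, "B": 200, "C": 100, "D": 100}
ne_counts = {"A": 60, "B": 40, "C": 10, "D": 10}
microsome_counts = {"A": 5, "B": 40, "C": 55}

ne_values = nsaf(ne_counts, lengths)
microsome_values = nsaf(microsome_counts, lengths)
enriched_set = ne_enriched(ne_values, microsome_values)
print(sorted(enriched_set))
```

In this toy example only protein A (about 9.6-fold enriched) and protein D (absent from microsomes) pass the threshold, while B and C would remain in the total data set only.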

These NEs and microsomes were then extracted with alkali or detergent treatments to remove many of these presumed contaminants prior to mass spectrometry analysis (Fig. 1A). One aliquot of NEs was extracted with 0.1 M NaOH because this breaks most protein-protein interactions without solubilizing membranes and so enriches for transmembrane proteins. Another aliquot was extracted with 500 mM NaCl/1% β-octylglucoside because the detergent will draw the membrane lipids into micelles without perturbing the intermediate filament lamin polymer and so enriches for proteins tightly associated with the lamin polymer. Both aliquots were separately analyzed because some well-characterized NETs distribute to one or the other fraction. Each extracted fraction was digested with trypsin and soluble peptides were equally divided for use in 4 direct replicate multidimensional protein identification technology (MudPIT) runs. Remaining insoluble material was further digested with proteinase K at high pH and split for an additional two runs (Fig. S1; Table S1).

Figure 1 (continued). ...amounts of total protein loaded. NE proteins lamin A/C and LAP2β were not present in the microsome fraction while the ER marker calreticulin was highly enriched in the microsome fraction compared with the NE fraction. As the ONM is continuous with the ER, the low calreticulin signal in the NE fraction is expected. (F) A mitochondria fraction was isolated by pelleting at 11,000 x g from the post-nuclear supernatant. The mitochondrial marker porin was undetectable in NEs at the level of sensitivity of the LI-COR for fluorescence detection, even when 20x more NEs were loaded.

The use of both different extrac- 
tion and different digestion conditions 
increased the total number of NETs 
identified. The separate analysis of salt/ 
detergent extracted NEs resulted in the 
recovery of 34 additional transmembrane 
proteins over the NaOH extracted NEs (a 
6% increase) using the NE-enriched data 
set (Fig. 2A, top Venn diagram). A similar increase in identified 
NETs was obtained (5.6%) when including NETs also < 5-fold 
enriched over microsomes. The sequential digests (trypsin, then 
proteinase K) were engaged because the NE lamina is defined 
largely by its insolubility, consisting of both intermediate fila- 
ment and transmembrane proteins. Thus it was anticipated that 
associated proteins would be missed that were impervious to the 
trypsin digestion due to hydrophobic aggregation. This anticipa- 
tion was justified because 62 additional transmembrane proteins 
(an 11.5% increase) were uniquely identified in the proteinase K 
fraction and not in the trypsin fraction using the NE-enriched 
data set (Fig. 2A, bottom Venn diagram). This approach would 
thus likely benefit other proteomic analyses of transmembrane 
proteins. 

Similarly, engaging multiple replicate runs enabled more com- 
prehensive identification of all proteins in the fractions. The total 
number of proteins identified from six salt/detergent extracted 
NE runs was nearly 50% higher than the number identified 
in any individual run (Fig. 2B). As run number increased the number of new identifications dropped, reaching a plateau by ~5 runs where it could be estimated that roughly all proteins in the fractions had been identified. Whereas the earlier liver study that engaged only one run for each extracted fraction identified 1,150 proteins, this new liver study identified 2,921 proteins (Table S2).
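The plateau in new identifications can be illustrated with a short sketch that tallies, run by run, how many proteins are newly identified and how many unique proteins have accumulated. The run contents below are hypothetical placeholders, and the result depends on the order in which runs are considered, so this is only an illustration of the idea behind Figure 2B and not the procedure used to draw it.

```python
def cumulative_identifications(runs):
    """For an ordered list of runs (each a collection of protein IDs), return
    (cumulative unique totals, number of new identifications per run)."""
    seen = set()
    totals, new_counts = [], []
    for identified in runs:
        new = set(identified) - seen   # proteins not seen in any earlier run
        seen |= new
        new_counts.append(len(new))
        totals.append(len(seen))
    return totals, new_counts


# Hypothetical replicate runs of the same sample.
example_runs = [
    {"a", "b", "c", "d"},
    {"a", "b", "e"},
    {"c", "e", "f"},
    {"a", "f"},
]
totals, new_counts = cumulative_identifications(example_runs)
print(totals)      # cumulative unique identifications after each run
print(new_counts)  # new identifications contributed by each run
```

A curve of `totals` against run number flattens once `new_counts` approaches zero, which is the behavior described for roughly five replicate runs.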

The finding that so many replicate runs were required for new identifications to reach a plateau is an important observation because it indicates the need for many replicate mass spectrometry runs when investigating complex fractions. The blood leukocyte and muscle studies had respectively engaged 5 and 7 MudPIT runs using the same digestion and run methodology (refs. 9 and 10). Thus all three analyses should be fairly comprehensive and could be compared on equal footing. In the case of the blood leukocyte data sets, 5 MudPIT runs were performed for unstimulated leukocytes and an additional 5 MudPIT runs were separately performed for PHA-activated leukocytes, finding some differences between the two states; so for purposes of comparison just the PHA-stimulated data sets are used while total NET tallies include both. Before comparing the new liver NE data sets with the blood leukocyte and muscle NE data sets, redundancy due to differences in annotation was removed by converting protein IDs from all 30 MudPIT runs to orthologous gene groups. This yielded 5,222 proteins in total identified among unstimulated and activated leukocyte, muscle and liver NE fractions and 1,037 NETs (Table S3). Proteins were ranked by abundance estimates based on normalized spectral counts and compared with microsomes to generate the NE-enriched data set containing 4,077 proteins and 598 NETs.

Figure 2. Comparison of replicate MudPIT runs. (A) Both the process of separately analyzing alkali and salt/detergent extracted NEs and the sequential protease digestions increased the recovery of proteins, particularly NETs. Proportional Venn diagrams are shown for the transmembrane proteins identified in all the NaOH or salt/detergent extracted samples and for those identified in all trypsin alone or trypsin followed by proteinase K runs. (B) The same complex sample equally divided yields differences in the identifications for each MudPIT run. However, as the number of runs increases fewer new unique proteins are identified such that the curve plateaus with roughly five replicates. In this experiment liver NEs extracted with 400 mM NaCl, 1% β-octylglucoside were digested with trypsin, insoluble material pelleted and digested with PK. The trypsin sample was split into four equal samples and the PK material into two equal samples and all six samples were separately run on the mass spectrometer.

Nuclear envelope tissue specificity. Considerable differences were observed in the total protein complement of the NE when comparing the three data sets. First, proteins were plotted as heat maps using a color coding based on the abundance estimate for a particular tissue using normalized spectral counts across all tissues (Fig. 3A). Many proteins were uniquely identified in a single tissue. Furthermore, differences in abundance were often observed for those identified in multiple tissues. Though some of those identified in all tissues were highly abundant, surprisingly, quite a few of the tissue-specific NE proteins were more abundant than ones that were ubiquitously expressed. Second, tissue-specific NETs tended to be less conserved than those widely expressed. Analysis of the evolutionary conservation of NETs revealed a statistical correlation such that the more tissue-specific the NET the less it was conserved (Fig. 3B). The subset of tissue-specific proteins thus is more likely to have evolved more specific functions. Third, tissue specificity of protein identifications correlated with tissue-specific expression data. NE proteins identified in the various tissues were checked for expression against the high-throughput BioGPS transcriptome database that compared gene expression levels between 80 different human tissues. Proteins color-coded by their proteomic identification in human blood leukocytes (red), rat muscle (yellow) or rat liver (blue) and the various combinations were plotted according to the tissue preference for their transcript expression in whole human blood (x-axis), human liver (y-axis) or human muscle (z-axis) (Fig. 3C). This yielded a clear correlation between mRNA tissue expression and protein identification in our tissue NE fractions across the wider set of proteins identified and underscores that many tissue differences are conserved across species.

Figure 3. Global analysis of NE tissue differences. (A) All proteins identified in all tissues are plotted in the heatmap with TM proteins on the left and soluble proteins on the right. Color-coding is for log-transformed dNSAF values, indicating the relative abundance within a particular tissue with red meaning high abundance and blue low abundance. Raw dNSAF values are given in Table S3. Black indicates absence from a particular data set. The PHA-activated blood leukocyte data set was used, but results are indistinguishable from the separate unstimulated blood leukocyte data set. The differences between tissues and the lack of a correlation between abundance in a tissue and overall abundance further support the tissue specificity. (B) The percent identity between mouse, rat and human homologs was calculated for all NE proteins and the distribution is plotted using variable width Tukey boxplots according to the number of tissues in which each protein was found. The proteins found in all three tissues were significantly more conserved in sequence than the proteins found in just one (Kolmogorov-Smirnov test: D = 0.1547, p value 4.0 x [...]). (C) NE proteins that were identified in at least 60% of runs compared with data from the BioGPS transcriptome database. Proteins identified in different tissues were color-coded: PHA-activated human blood leukocytes (red), rat muscle (yellow), rat liver (blue), PHA-activated blood leukocytes and muscle (orange), PHA-activated blood leukocytes and liver (purple), muscle and liver (green), all three tissues (brown). These were then plotted according to their level of expression in the different tissues such that those more specifically expressed in human blood, muscle or liver respectively climb along the x-, y- and z-axis. Note that the BioGPS transcriptome database did not have a separate leukocyte-enriched population similar to that used for the proteomic analysis; therefore, whole blood was used for the comparison because expression in this tissue should encompass all the proteins identified in the more restricted blood leukocyte NE data sets.

The calculated percentage of NETs shared between the muscle, liver and blood leukocyte NEs was remarkably small: only 16% of the total NETs identified (Fig. 4A). Thus the vast majority of NETs identified are distinct in certain tissues. These tissue differences are not driven by species differences because the overlap between the two rat tissues (liver and muscle: 31%) is similar to that between these and the human leukocytes (25% and 27%). Prep-to-prep variation should have been largely averaged out because, due to low yields in isolating clean leukocyte and muscle NEs, at least 12 individual preparations were combined. A value for shared proteins around 10% was maintained whether considering all proteins identified including soluble proteins or considering subsets that represent higher stringency criteria such as the NETs, those with spectral abundance in NEs > 5-fold higher than microsomes or proteins identified in multiple MudPIT runs (Fig. 4B). Nonetheless, this number should not be taken as an absolute value because of the inability to distinguish all contaminants obtained during NE isolation.
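The overlap percentages discussed above can be sketched as simple set arithmetic. The example assumes that each percentage is the number of proteins common to the compared data sets divided by the number in their union; the exact denominator used for the published values is not stated here, so this is an assumption. The protein labels are hypothetical.

```python
def shared_fraction(*datasets):
    """Fraction of all proteins (union) that is present in every data set (intersection).

    Assumption: the denominator is the union of the compared data sets.
    """
    sets = [set(d) for d in datasets]
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


# Hypothetical NET identifications per tissue.
liver = {"A", "B", "C", "D"}
muscle = {"A", "B", "E", "F"}
leukocyte = {"A", "C", "G", "H"}

all_three = shared_fraction(liver, muscle, leukocyte)   # 1 of 8 proteins shared
liver_muscle = shared_fraction(liver, muscle)           # 2 of 6 proteins shared
print(f"{all_three:.1%} shared by all three, {liver_muscle:.1%} by liver and muscle")
```

The same function can be applied to higher stringency subsets (for example only the NE-enriched proteins) to check whether the shared fraction stays roughly constant.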

To estimate the level of possible contamination acquired during cell fractionation, the complete data sets (5,222 proteins) were searched for proteins with gene ontology (GO)-localization terms for other cellular organelles (particularly membrane bound organelles). The tissue distribution of this subset of NE proteins that have previously been linked to other organelles was the opposite of that for the whole data sets, with the largest fraction being shared by all three tissues for both the total protein set (not shown) and for 356 NETs (Fig. 4C). Thus, if these proteins were considered as contaminants it would only increase the amount of tissue specificity among the remaining proteins.

If the tissue differences were due to a particular organelle more readily co-purifying with NEs in one tissue vs. another, then contaminants from that organelle would be expected to accumulate in a particular tissue. This was not the case for any organelle (Fig. 4D). Instead the percentage of these potential contaminants in each tissue was roughly equal for individual organelles, whether considering those identified exclusively in one tissue, in any two tissues or in all three tissues.

Proteins with GO-localization terms for any individual organelle generally represented only 0.5 to 3% of total proteins in any tissue data set with the exception of ER and mitochondria (Fig. 4D). This could indicate contamination due to a favored relationship between the NE and these organelles. Indeed, as noted before, the ER is continuous with the NE and also mitochondria are observed in invaginations of the NE. However, the fact that only a fraction of the known proteins from these organelles were identified in the NE preparations is perhaps more consistent with this subset of proteins having functions in both organelles. Interestingly, a large number of these proteins were associated with multiple GO-localization terms, further consistent with their identification as valid NE components.

If viewing those proteins enriched in the ER and mitochondria as more likely to be contaminants, then the use of the enriched data set (with normalized spectral abundance in the NE data sets > 5-fold over the microsome data sets) should yield high confidence in identifications. Mitochondrial proteins accounted for 3% of those in the NE data sets: subtracting those left 3,946 proteins in the mixed enriched data set of which 571 were putative NETs. Because not all NETs were exclusive to the NE, we refer to individual NETs using their gene names.

Direct confirmation of NET tissue specificity. The only way to verify if a protein is a true NE component is through direct testing by microscopy for targeting to the NE. The validity of these data sets is supported by the confirmation either with tagged fusions or antibodies thus far of 87 NETs (Table S4). All of the original 13 NETs met the enrichment criterion of being > 5-fold enriched in the NE compared with microsomes and nearly 90% of the new NETs meeting this criterion that have been tested are now confirmed. Strikingly, over half of the 74 new NETs confirmed to at least partially accumulate at the NE were < 5-fold enriched compared with microsomal proteins based on normalized spectral counts; in fact 8 had very similar levels. Interestingly, 95% of the proteins tested in the < 5-fold enriched set were validated as targeting to the NE. Thus many proteins in the data sets clearly occupy multiple cellular compartments, as expected from the continuity between the ER and outer nuclear membrane.

Several proteins identified in only one of the extraction conditions were confirmed at the NE, further validating the methodology of using different extractions and sequential digests to increase identification of membrane proteins. NET33/SCARA5 and Tmem70 were identified only in the proteinase K digested data sets while NET34/SLC39A14 and NET62/MCAT were identified only in the salt/detergent extracted data sets and another 16 proteins were identified only in the NaOH extracted/trypsin digested data sets.

Figure 4. NE tissue distinctiveness. (A) Numbers of NETs overlapping between rat liver, rat muscle and human PHA-stimulated blood leukocytes expressed using a proportional Venn diagram. For this analysis PHA-stimulated blood leukocytes were considered separately from the leukocytes that were not stimulated (hence fewer than the 1,037 total NETs identified). This was done because protein differences between the two conditions render them almost like separate tissues; however, results were similar when comparing either blood leukocyte data set with the other tissues. (B) The small amount of overlap between tissues for the NE proteome is maintained for higher stringency protein subsets. Plotting the percentage of proteins shared between all three tissues yields a value of ~10% that is maintained when considering higher stringency subsets: NET, transmembrane proteins; 5x, 5-fold enriched over microsomes by abundance estimates (dNSAF); Multi, appeared in multiple MudPIT runs for the same tissue. (C) Proportional Venn diagram of transmembrane proteins found in NE data sets that had GO-targeting annotations associated with other organelles. As in A, just the PHA-stimulated blood leukocytes are shown for the comparison. Many more proteins were identified in multiple tissues among this set compared with the total set of NETs shown in Figure 3A. (D) The percentage of total proteins in a particular NE tissue data set that have GO-targeting annotations for each organelle is plotted by each contaminating organelle (cyto-mbv is cytoplasmic membrane bound vesicles). Most organelles represented less than 3% contamination of any NE preparation. Moreover, the amount was similar for proteins identified in individual or multiple NE tissue data sets. Only mitochondria and ER GO-targeting annotated proteins were found more commonly in all NE data sets. Standard deviations are shown. Note that the error bar for mitochondria contaminants in 2-tissues goes above the graph scale: the standard deviation value is 10.96.

Figure 5. Experimental confirmation of NET tissue-specificity. (A) RT-PCR from human tissue RNA confirms that several of the NETs identified by MudPIT in a particular tissue have messages preferentially transcribed in that tissue and often absent from a subset of other tissues. NETs shown are grouped according to the tissue of highest expression (muscle, liver or blood leukocytes), but some were found in multiple data sets. Peptidylprolyl isomerase A (PPIA) was used as a loading control. (B) Protein tissue specificity assessed by western blot. Equal amounts of protein from lysates of rat liver, heart, muscle and thymus were compared on the same blots for the levels of different NETs. NET levels were quantified using fluorescently labeled secondary antibodies using a LI-COR system. The total signal for all four tissues was calculated and the fraction of the total signal in each tissue is colored in the plot. The average values from three blots are shown. (C) Cryosections of rat heart or leg muscle, liver and spleen were stained with NET antibodies. Nuclear rim staining was only observed in the tissue where the NET was identified by proteomics: C17orf62 was more abundant in blood leukocytes, but identified in all tissues. Sometimes cytoplasmic staining was also observed, consistent with NETs occupying multiple cellular locations; however, most of the appearance of NETs in the cytoplasm in other tissues comes from increasing exposure times in order to see the background. Bars 10 µm.

Tissue specificity was directly tested for several NETs by investigating transcript and protein levels and antibody staining in different tissues. Transcript levels of 64 putative and confirmed NETs were compared by RT-PCR in 8 human tissues (Fig. 5A; Fig. S2 and Table S4). Though some were ubiquitously expressed, many NET transcripts were only detected in a subset of the tissues examined. This tissue expression largely matched the tissue identification by mass spectrometry. Tissue differences at the protein level were confirmed by western blot with NET antibodies comparing rat liver, heart, leg muscle and thymus lysates (Fig. 5B). Equal amounts of each tissue were resolved on the same gels for western blotting and the signals for bands corresponding to each NET were directly quantified from fluorophores conjugated to the antibodies. The fraction for each tissue from the total signal from all tissues combined is plotted.
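The western blot quantification can be sketched as follows: for each blot the band signal in each tissue is divided by the summed signal of all four tissues, and the fractions are then averaged over replicate blots. Whether fractions were averaged per blot (as here) or raw signals were averaged first is an assumption of this sketch, and the signal values are hypothetical.

```python
def tissue_fractions(blot):
    """Fraction of the total band signal contributed by each tissue on one blot."""
    total = sum(blot.values())
    return {tissue: signal / total for tissue, signal in blot.items()}


def mean_fractions(blots):
    """Average the per-blot fractions over replicate blots (assumed approach)."""
    per_blot = [tissue_fractions(b) for b in blots]
    return {
        tissue: sum(f[tissue] for f in per_blot) / len(per_blot)
        for tissue in per_blot[0]
    }


# Hypothetical fluorescence signals for one NET on three replicate blots.
replicate_blots = [
    {"liver": 10, "heart": 30, "muscle": 50, "thymus": 10},
    {"liver": 20, "heart": 20, "muscle": 40, "thymus": 20},
    {"liver": 0, "heart": 80, "muscle": 120, "thymus": 0},
]
averaged = mean_fractions(replicate_blots)
for tissue, fraction in averaged.items():
    print(f"{tissue}: {fraction:.2f}")
```

Normalizing within each blot before averaging keeps differences in overall exposure between blots from dominating the result.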

Finally, antibodies generated to several of the confirmed NETs identified in liver, muscle and/or leukocytes were tested on cryosections from rat muscle, liver and spleen (Fig. 5C). With the exception of C17orf62, which was identified in all three tissues, NE staining was only observed in the tissue from which the novel NET was identified. For example, POPDC2 and Tmem38A were both identified only in muscle and a rim staining around the nucleus as defined by DAPI staining for DNA was observed in the muscle cryosections, but not in the liver or spleen cryosections. To ensure that no weak nuclear rim staining was occurring, the exposure times were much longer in the tissues where rim staining was not observed, giving the appearance of high background. Similarly, nuclear rim staining was only observed in liver for DHRS7 and TM7SF2, which were identified only in the liver data sets, and Tmem126A, which was identified uniquely in the leukocytes, appeared at the nuclear rim only in spleen (Fig. 5C). Importantly, it can be observed that even within a tissue staining was restricted to only certain cell types. Tissues are made up of multiple cell types and, accordingly, Tmem38A strongly stains only 3 of the 6 nuclei in the image shown. The cryosection data importantly demonstrate that tissue-specificity applies to NE residence, even if some moderate expression is observed in another tissue.
Functional consequences of NE tissue differences. General protein characteristics such as isoelectric point, transmembrane topology and prevalence of coiled-coils did not vary between NETs identified in different tissues (data not shown). At the same time the previously suggested tendency for NETs to have higher isoelectric points was supported, suggesting that these data sets can be used to extract general characteristics of NETs.

A further support of both the tissue distinctiveness of the NE 
and its functional importance is that the components of a particu- 
lar protein complex were often found together in one tissue while 
being absent from the other tissues. NE proteins from the dif- 
ferent tissue data sets were searched for their occurrence in com- 
plexes listed in the HPRD database at Johns Hopkins University. 
For each complex for which a component was found in the NE 
data sets the proportion of complex components identified in the 
NE of each tissue was plotted (Fig. 6A). Though few complexes 
were recovered in their entirety in all three tissues, in many cases 
a complex was fully identified in one tissue but not identified 
at all or only partially identified in other tissues. In these latter 
cases the complex might contain different components in differ- 
ent tissues. These same characteristics were observed when con- 
sidering only complexes containing transmembrane components 
(Fig. 6B). 
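The complex analysis described above amounts to computing, for every annotated complex with at least one component in the NE data sets, the proportion of its components identified in each tissue. The sketch below shows this with hypothetical complex and protein names; it does not query HPRD, and the input dictionaries are assumed to have been prepared beforehand.

```python
def complex_coverage(complexes, tissue_sets):
    """Proportion of each complex's components identified in each tissue.

    Complexes with no component in any NE data set are skipped.
    `complexes` maps a complex name to a set of protein IDs;
    `tissue_sets` maps a tissue name to the set of proteins identified there.
    """
    all_ne = set().union(*tissue_sets.values())
    coverage = {}
    for name, members in complexes.items():
        members = set(members)
        if not members & all_ne:
            continue
        coverage[name] = {
            tissue: len(members & set(identified)) / len(members)
            for tissue, identified in tissue_sets.items()
        }
    return coverage


# Hypothetical complexes and per-tissue identifications.
example_complexes = {
    "cx1": {"P1", "P2", "P3"},
    "cx2": {"P4", "P5"},
    "cx3": {"P9"},            # no component found in any tissue
}
example_tissues = {
    "liver": {"P1", "P2", "P3", "P4"},
    "muscle": {"P1"},
    "leukocyte": {"P4", "P5"},
}
coverage = complex_coverage(example_complexes, example_tissues)
for name, per_tissue in coverage.items():
    print(name, per_tissue)
```

Here `cx1` is fully recovered only in liver and `cx2` only in leukocytes, the kind of pattern the text describes as a complex segregating into a particular tissue.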

Another possible reason for NE tissue specificity is suggested 
by the subset of proteins identified at the NE that have Gene 
Ontology (GO)^'' functional annotations. To determine if partic- 
ular nuclear GO-functions were partly accumulating at the NE 
so that they become relatively enriched at the NE, the fraction 
of our NE proteins with a particular functional annotation was 
compared with the fraction of total GO-annotated nuclear pro- 
teins with the same functional annotations. Thus a positive value 
for this ratio indicates relative enrichment at the NE such that 
the percentage of NE proteins devoted to a particular function 
is greater than the percentage of GO-nuclear proteins devoted to 
that same function. Negative values indicate relative enrichment 
in the nucleoplasm (Fig. 6C-F). All GO-functional annotations 
used were experimentally verified. 
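
The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name and arguments are hypothetical, and the zero-centring (a 1:1 ratio maps to 0, enrichment is positive, deficiency is negative) is assumed to be a symmetric fold transform, which the text does not spell out.

```python
def relative_enrichment(ne_with_term, ne_total, nuc_with_term, nuc_total):
    """Zero-centred fold enrichment of a GO function at the NE.

    All four arguments are protein counts and must be positive.
    The symmetric transform below is an assumption.
    """
    if min(ne_with_term, ne_total, nuc_with_term, nuc_total) <= 0:
        raise ValueError("all counts must be positive")
    ne_fraction = ne_with_term / ne_total      # share of NE proteins with the term
    nuc_fraction = nuc_with_term / nuc_total   # share of GO-nuclear proteins with the term
    ratio = ne_fraction / nuc_fraction
    if ratio >= 1:
        return ratio - 1            # fold-relative enrichment at the NE
    return -(1 / ratio - 1)         # fold-relative deficiency at the NE
```

With this convention, a function carried by 10% of NE proteins but only 5% of nuclear proteins scores +1, and the reverse case scores -1.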

General functional categories yielded expected distributions, 
e.g., DNA and RNA functions were relatively enriched in the 
nucleoplasm while transport functions were relatively enriched 
at the NE (Fig. 6C). However, more specific functions varied 
in relative NE enrichment, some according to the tissues sam- 
pled. Although as expected general RNA functions were more 
nucleoplasmic, those proteins involved in splicing were relatively 
enriched at the NE (Fig. 6D). This does not mean that splicing 
preferentially occurs at the NE compared with the nucleoplasm 
(it is clearly more nucleoplasmic by localization studies), 
but that splicing proteins account for a larger share of the NE 
data sets than of GO-annotated nuclear proteins overall. The relative enrichment of 
some splicing functions at the NE is perhaps due to splicing 
factors that remain associated with mRNAs during transport 
through the nuclear pore complex. Other GO-functions had 
tissue-specific differences in their relative functional enrich- 
ment at the NE. Polyadenylation functions, for example, were 
relatively enriched at the NE only in liver and muscle, but not 
in leukocytes. 



Although most epigenetic regulatory functional groupings 
were not relatively enriched at the NE, certain silencing factors 
became relatively enriched at the NE in certain tissues. Polycomb 
group proteins were only relatively enriched at the NE in liver 
while NuRD was relatively enriched in all but unstimulated 
blood leukocytes (Fig. 6E). Similarly, several proteins involved 
in functional groupings for signaling pathways were also rela- 
tively enriched at the NE (Fig. 6F). Of note, a large difference 
was observed between unstimulated and PHA-stimulated blood 
leukocytes in cGMP-mediated signaling, Wnt receptor signaling 
was only relatively enriched in liver and BMP signaling was more 
relatively enriched in the nucleoplasm in blood leukocytes than 
in liver and muscle. 

Discussion 

The high degree of tissue specificity in the NE proteome 
observed here was not expected. However, it is not surprising 
in retrospect when considering that the three tissues compared 
are all made up of many different cell types that have striking 
differences in nuclear size, shape and the amount of dense chromatin 
at the NE. For example, in addition to hepatocytes, liver 
contains biliary epithelia, sinusoidal cells, Kupffer cells and 
stellate cells as well as connective tissue, veins and arteries, and 
muscle composition is similarly diverse. Though the blood cells 
were determined to be roughly 70% leukocytes, there were also 
many other cell types present. Applying the same bioinformatic 
analysis to a previously published data set of mitochondria from 
different tissues** indicates that the NE has at least 3-fold more 
tissue specificity than mitochondria. However, even the differ- 
ences observed for mitochondria are large enough that these 
studies together strongly argue for evaluation of the protein 
complement of all organelles in different tissues. Furthermore 
the functional implications of this work underscore the impor- 
tance of considering the possibility of tissue-specific mediators 
when studying the function of most well-characterized proteins. 

As roughly a third of the exome is predicted to encode 
transmembrane proteins, it is important to develop improved 
methods for their detection in proteomic analyses. Most such 
approaches have focused on chemically improving resolution of 
membrane proteins on 2D gels, but it is generally assumed that 
LC/LC/MS/MS approaches avoid the losses inherent in 2D 
gels.^^'^** While this is certainly true to some extent, the greater 
than 10% increase in NET identifications we observe by using 
multiple extraction and digestion conditions strongly argues 
that proteomic studies still tend to under-represent transmembrane 
proteins and provides a simple approach that can increase 
membrane protein identifications. Our results also underscore 
the importance of engaging multiple replicate runs for com- 
plex fractions. Though some of the differences between liver 
NE proteins identified in this study compared with a previous 
MudPIT analysis of liver NEs" might be attributed to improve- 
ments in peptide fragmentation and identification in the mass 
spectrometers used, within this study the replicate runs and 
sequential digests roughly doubled the number of identifications. 
Thus the identification of 2,921 liver NE proteins in this 
study compared with 1,150 in the previous study is apparently 
mostly due to these two procedural changes since the purification 
procedures used to isolate NEs from liver were identical in 
both studies. 

Figure 6. Differences in NE functional composition in different tissues. (A) Proteins identified in at least 60% of runs for a particular NE fraction were 
searched for their inclusion in the Johns Hopkins HPRD database of annotated protein complexes and then for each complex the proportion of 
complex components found in each tissue was calculated. Data are shown for the 352 complexes for which at least 75% of components were found 
in the data sets and the percentage for each tissue is plotted additively with liver in blue, muscle in yellow and blood leukocytes in red. The maximum 
proportion for any individual complex is 1; therefore a complex fully intact in all three tissues would have a value of 3. For many complexes all 
components were found in only one or two of the tissues. (B) The percentage of complex components found in each tissue is similarly plotted, but 
restricted to those complexes containing a NET. (C-F) Within the subset of proteins in NE data sets with GO-annotations, the fraction with a particular 
functional annotation was calculated. Similar fractions were calculated against all "nuclear"-annotated proteins in the GO-database. The relative ratio 
of NE/nuclear fractions was then calculated, setting a 1:1 ratio to 0 so that positive values are fold-relative enrichment and negative are fold-relative 
deficiency at the NE compared with the whole nucleus. Relative enrichment thus indicates a function that is more enriched at the NE compared with 
other functions associated with the nucleus as opposed to indicating a concentration at the NE. Liver, muscle and unstimulated and activated blood 
leukocyte data sets are represented by blue, yellow, pink and red bars, respectively. (C) General functions such as nuclear transport, signaling and ion 
transport were relatively enriched at the periphery. (D) More specific RNA functions revealed some tissue differences in relative NE enrichment. (E) 
Certain epigenetic functions were relatively enriched at the NE in particular tissues. (F) Some signaling functions were relatively enriched at the NE 
in certain tissues. For example, Wnt signaling was relatively enriched at the NE in liver while being strongly deficient at the NE in unstimulated blood 
leukocytes. 



The extreme sensitivity of mass spectrometry often yields 
identification of even minor contaminants in a sample. As cel- 
lular fractions are inherently impossible to purify to homogene- 
ity, this has led to a tendency to use cutoffs based on abundance 
estimates as an effective in silico purification step. While this 
makes sense for most soluble proteins, that 95% of NETs tested 
from the <5-fold enriched set were confirmed as targeting at least 
in part to the NE argues that a separate, less stringent cutoff may 
be appropriate for transmembrane proteins. 

The reason that close to half of the total NETs identified were 
not > 5-fold enriched at the NE is likely because they have mul- 
tiple cellular localizations. This is consistent with separate high- 
throughput observations that 40% of all proteins have multiple 
cellular localizations"" as well as the physical structure of the NE 
in association with both mitochondria and ER."'^^ Thus it is not 
surprising that three tissue-specific proteins in the data sets previously 
published as mitochondrial and ER proteins (MARCHV, 
Tmem70 and Tmem38A; refs. 29-31) were confirmed as NETs 
in the inner nuclear membrane by super-resolution microscopy.'''" 
Even the well characterized and NE-enriched NET emerin has 
now been found, in addition to its predominant inner nuclear 
membrane localization, also to associate with the centrosome 
in the outer nuclear membrane, the cytoplasm in myotubes and 
interstitial discs in cardiac tissue. '^'^^'^^ 

In keeping with this tendency for multiple cellular localizations, 
the <5-fold enriched set includes several proteins now 
confirmed at the NE by others that have characterized functions 
in small molecule transport or signaling (Table S4; refs. 34-38). 
The exclusion limit of the nuclear pore complexes allows for small 
molecules and ions to exchange;" thus, the nucleus likely needs a 
variety of membrane transporters to maintain ion levels and pH. 
Notably, in muscle, where strong calcium fluxes during contrac- 
tion could result in leakage into the nucleus that might damage 
the genome, there was an abundance of calcium transporters and 
associated proteins identified in the NE data sets that could clear 
the ion from the nucleus (Table S3). The identification of signaling 
molecules in the NE by direct testing''"'''*'^"''" 
supports the wider findings from the GO-functional term analysis 
of tissue-specific accumulation of signaling molecules in the 
NE. The specific observations regarding some NE accumulation 
of Wnt and BMP signaling proteins in certain tissues (Fig. 6F) 
reinforce separate observations that Smads and β-catenin interact 
with the NETs MAN1 and emerin''^''" and, perhaps more 
importantly, provide an explanation for how mutation of these 
ubiquitously expressed NETs can lead to tissue-specific disease 
pathologies. 

The identification of chromatin binding/modifying proteins 
in association with the NE is also supported by various individual 
observations in the literature (e.g., LAP2β with BAF and 
HDAC3,''''« hALP1 with SUN1,"''' emerin with Lmo7,"" MeCP2 
and HP1 with LBR""'*'), though this study provides the first 
large-scale sampling of such proteins at the NE. Interestingly, the 
NuRD complex we find to be relatively enriched at the NE in cer- 
tain tissues (Fig. 6E) is involved in progeria defects caused by NE 
mutations.'" Thus known proteins in these data sets fit with the 
current literature for the NE influencing genome functions and 
these functions are tissue-specific. This indicates the likelihood 
that some of the 419 new NE proteins lacking GO-annotations 
(including both soluble proteins and NETs) also will contribute 
to genome functions. This likelihood is supported by findings 
that several NETs with unknown functions identified in the leukocyte 
data sets could alter genome organization.' More compelling, 
though, is the observation that even a protein with a known 
and distinct function identified in our data sets, the Na,K-ATPase 
βm-subunit, has been confirmed at the NE and found 
to serve a secondary function as a co-regulator of transcription.^'"'^ 

NETs and their soluble partners that accumulate at the NE 
only in certain tissues could explain the focused tissue pathology 
in NE diseases. For example, a muscle-specific NET that complexes 
with Lamin A or Emerin could be lost from the NE with 
mutations in these proteins linked to Emery-Dreifuss muscular 
dystrophy.'^''* Differences between tissues in NE signaling pathways 
or chromatin organization/gene regulation indicated here 
could result in particular tissues having greater susceptibility to 
disruption of specific functions with particular NE mutations. In 
keeping with this idea, the NuRD complex and signaling pro- 
teins indicated here to vary at the NE between tissues have been 
linked to various NE diseases and proteins.^^''"'"' The primary 
deficits in heritable diseases tend to localize in a particular tissue 
and moreover within a particular organelle. Thus the subcellular 
location and tissue distribution of proteins linked to disease are 
important in constructing a model for how their mutation can 
lead to pathology. This study suggests that to better understand 
such diseases all cell organelles should be analyzed for tissue spec- 
ificity in their proteomes. 

Materials and Methods 

Preparation and MudPIT analysis of NEs. Rat liver NEs and 
microsomes were prepared and analyzed by MudPIT as described 
in references 55 and 56. The human blood leukocyte and rat 
muscle NE preparation and data sets are described in detail in.'''" 
All protein pellets were solubilized in 0.1 M TRIS-HCl, pH 8.5, 
8 M urea, 5 mM TCEP. Iodoacetamide was added to 10 mM 
for 30 min and endoproteinase Lys-C and trypsin digestion performed 
as above. Samples were centrifuged 30 min at 17,500 x g. 
Supernatants ("Ti" digests) were analyzed by MudPIT while 
pellets were resuspended in 0.1 M Na2CO3 pH 11.5, 8 M urea, 
5 mM TCEP for 30 min, then 10 mM iodoacetamide 30 min 
and then digested with proteinase K 4 h at 37°C'^ and also analyzed 
by MudPIT ("PK" digests). 

Importantly in all cases at least five separate MudPIT runs"'^" 
were engaged for each preparation. During the course of a fully 
automated chromatography, fifteen 120-min cycles (Table S1) of 
increasing salt concentrations followed by organic gradients 
slowly released peptides directly into the mass spectrometer."" 
Three different elution buffers were used: 5% acetonitrile, 0.1% 
formic acid (Buffer A); 80% acetonitrile, 0.1% formic acid 
(Buffer B); and 0.5 M ammonium acetate, 5% acetonitrile, 0.1% 
formic acid (Buffer C). The last five (out of 15) chromatography 
steps consisted of a high salt wash with 100% Buffer C followed 
by the acetonitrile gradient. The distal application of a 2.5 
kV voltage electrosprayed the eluting peptides directly into ion 
trap mass spectrometers equipped with a nano-LC electrospray 
ionization source (ThermoFinnigan). Each full MS scan (from 
400 to 1,600 m/z) was followed by five (LTQ) MS/MS events 
using data-dependent acquisition where the most intense ion 
was isolated and fragmented by collision-induced dissociation (at 
35% collision energy), followed by the second to fifth most intense 
ions. The raw data from each run are available at the Proteome 
Commons Tranche repository through the links given in Table 
S1. 

Data analysis. RAW files were extracted into ms2 file format'* 
using RAW_Xtract v.1.0.'"' MS/MS spectra (including the liver 
NEs and ER/microsomes data from") were queried for peptide 
sequence information using SEQUEST™ v.27 (rev.9)''" against 
28,400 rat proteins (non-redundant NCBI sequences on July 10, 
2006), plus 197 human and mouse homologs of previously iden- 
tified NETs''" and 172 sequences from usual contaminants (e.g., 
human keratins, IgGs, proteolytic enzymes, etc.). In addition, 
sequences for proteins we had annotated as NETs using previ- 
ous database releases were added to these databases. Finally, to 
estimate false discovery rates, each non-redundant protein entry 
was randomized. The resulting "shuffled" sequences were added 
to the database and searched at the same time as the "forward" 
sequences, leading to a total search space of 57,538 and 60,828 
sequences for the rat and mouse databases (Table S1). MS/MS 
spectra were searched without specifying differential modifications. 
To account for carboxamidomethylation by IAM, +57 Da 
were added statically to cysteine residues for all the searches. No 
enzyme specificity was imposed during searches, setting a mass 
tolerance of 3 amu for precursor ions and of ± 0.5 amu for frag- 
ment ions. 
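
The decoy construction described above can be illustrated with a short sketch. It assumes that "randomized" means shuffling the residues of each entry, which preserves length and amino acid composition; the function name, the `SHUFFLED_` prefix and the fixed seed are hypothetical choices for illustration only. Doubling the database is consistent with the numbers quoted: 28,400 + 197 + 172 = 28,769 forward entries, and 2 x 28,769 = 57,538.

```python
import random


def add_shuffled_decoys(proteins, seed=0):
    """Return a search database holding forward and shuffled entries.

    proteins: dict mapping entry name -> amino acid sequence.
    Each decoy keeps the length and composition of its forward entry.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility (assumption)
    database = dict(proteins)          # forward sequences
    for name, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)          # randomize residue order in place
        database["SHUFFLED_" + name] = "".join(residues)
    return database
```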

Spectrum/peptide matches were selected using DTASelect''' 
and only retained if peptides were at least 7 amino acids long 
and their ends complied with the specificity of the proteolytic 
enzymes used, when appropriate. For trypsin-digested 
samples, peptides had to be fully tryptic, while for samples that 
had been chemically cleaved with CNBr prior to trypsin diges- 
tion (previously acquired mouse NE data set), Methionine or 
Lysine or Arginine had to be present before the N-terminus and 
at the C-terminus of the peptide sequences. In both cases, the 
DeltCn had to be at least 0.08, with a minimum XCorr of 1.8 for 
singly-, 2.0 for doubly- and 3.0 for triply-charged spectra and a 
maximum Sp rank of 10. For the proteinase K-digested samples, 
no specific peptide ends were imposed, but the DeltCn cut-off 
was increased to 0.15,'^ while XCorr minima were increased to 
2.5 for doubly- and 3.5 for triply-charged spectra. SEQUEST 
parameters for the spectrum to peptide matches for all detected 
proteins from rat liver NE and microsomal membranes are pro- 
vided in Tables S2A and S2B, respectively. Results from different 
runs were compared and merged using CONTRAST''' (Table 
S2C). Proteins that were subsets of others were removed. NSAF7 
(Tim Wen) was used to create the final report (Table S2C) on all 
detected proteins across the different runs, calculate their respec- 
tive Normalized Spectral Abundance Factor (NSAF) values and 
estimate false discovery rates (FDR). 
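
As a rough illustration of the score thresholds listed above, the sketch below applies them to a single spectrum/peptide match. It is not DTASelect itself: the `Psm` record and function names are hypothetical, enzyme end-specificity is not checked, and the singly-charged XCorr minimum for proteinase K digests is assumed to stay at 1.8 because the text only states the doubly- and triply-charged values.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    peptide: str
    charge: int
    xcorr: float
    deltcn: float
    sp_rank: int


# Thresholds taken from the text; the PK value for charge 1 is an assumption.
TRYPTIC = {"deltcn": 0.08, "xcorr": {1: 1.8, 2: 2.0, 3: 3.0}}
PROTEINASE_K = {"deltcn": 0.15, "xcorr": {1: 1.8, 2: 2.5, 3: 3.5}}


def passes(psm, criteria, max_sp_rank=10, min_length=7):
    """True if the match meets the length, DeltCn, XCorr and Sp rank cut-offs."""
    min_xcorr = criteria["xcorr"].get(psm.charge)
    if min_xcorr is None:              # charge states outside 1-3 are rejected here
        return False
    return (
        len(psm.peptide) >= min_length
        and psm.deltcn >= criteria["deltcn"]
        and psm.xcorr >= min_xcorr
        and psm.sp_rank <= max_sp_rank
    )
```

The same match can therefore pass under the tryptic criteria and fail under the stricter proteinase K criteria.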

Spectral FDR was calculated as: 

$$\mathrm{FDR} = \frac{2 \times \mathrm{SHUFFLED\_SpectralCounts}}{\mathrm{Total\_SpectralCounts}} \times 100$$

Protein level FDR was calculated as: 

$$\mathrm{ProteinFDR} = \frac{\mathrm{SHUFFLED\_Proteins}}{\mathrm{Total\_Proteins}} \times 100$$

Under these criteria the final FDRs at the protein and peptide 
levels were 2.8 ± 1.5% and 0.4 ± 0.2%, respectively. 
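
The two FDR estimates can be written as small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, and the spectral estimate is taken to be twice the shuffled spectral count over the total spectral count, expressed as a percentage, while the protein estimate is the plain shuffled-to-total percentage.

```python
def spectral_fdr(shuffled_spectral_counts, total_spectral_counts):
    """Spectral FDR in percent: shuffled hits are doubled (assumed form)."""
    return 100 * 2 * shuffled_spectral_counts / total_spectral_counts


def protein_fdr(shuffled_proteins, total_proteins):
    """Protein-level FDR in percent."""
    return 100 * shuffled_proteins / total_proteins
```

For example, 2 shuffled spectra among 1,000 total would give a spectral FDR of 0.4%, and 28 shuffled proteins among 1,000 total would give a protein FDR of 2.8% (illustrative numbers, not the study's counts).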

To estimate relative protein levels, distributed normalized 
spectral abundance factors (dNSAFs) were calculated for each 
non-redundant protein or protein group, as described in:'* 

$$\mathrm{dNSAF}_i = \frac{\mathrm{dSAF}_i}{\sum_{k=1}^{N} \mathrm{dSAF}_k}$$

with 

$$\mathrm{dSAF}_i = \frac{\mathrm{uSpC}_i + \dfrac{\mathrm{uSpC}_i}{\sum_{m=1}^{M} \mathrm{uSpC}_m} \times \mathrm{sSpC}_i}{L_i}$$

in which shared spectral counts (sSpC) were distributed based 
on spectral counts unique to each protein i (uSpC) divided by the 
sum of all unique spectral counts for the M protein isoforms that 
shared peptide(s) with protein i, L is the protein length and N is 
the number of proteins detected (Tables S2C and S3). 
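
A minimal sketch of the dNSAF calculation follows, assuming the usual definition of a spectral abundance factor (spectral counts divided by protein length). The function name and input layout are hypothetical, and shared peptides whose proteins have no unique counts are simply skipped here, which is a simplification.

```python
def dnsaf(lengths, unique_counts, shared_counts):
    """Distributed normalized spectral abundance factors.

    lengths:        dict protein -> length in residues
    unique_counts:  dict protein -> unique spectral counts (uSpC)
    shared_counts:  list of (proteins_sharing, spectral_count) pairs (sSpC)
    """
    distributed = dict.fromkeys(lengths, 0.0)
    for group, spc in shared_counts:
        total_unique = sum(unique_counts[p] for p in group)
        if total_unique == 0:
            continue                   # nothing to apportion by (assumption: skip)
        for p in group:
            # Each protein receives shared counts in proportion to its unique counts.
            distributed[p] += spc * unique_counts[p] / total_unique
    dsaf = {p: (unique_counts[p] + distributed[p]) / lengths[p] for p in lengths}
    total = sum(dsaf.values())
    return {p: value / total for p, value in dsaf.items()}
```

By construction the values sum to 1 across all proteins in a run, so they can be compared between runs of different depth.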

Antibodies and western blotting. Antibodies used: GAPDH 
(Enogene, E1C604), Calreticulin (Cell Signaling, 2891S), 
Calnexin (Stressgen, SPA-860) and lamin A (3262). NET antibodies 
were rabbit polyclonals generated to peptides from human 
sequences (Millipore): LAP2β (06-1002), SUN2 (06-1038), 
TMTC3 (06-1009), TM7SF2 (06-1026), TMEM126A 
(06-1037), TMEM201 (06-1013), C17orf62 (06-1033), 
C17orf32 (06-1035), PPAPDC3 (06-1025), TMEM38A (06-1005), 
POPDC2 (06-1007), TMEM209 (06-1020), DHRS7 
(06-1027). 

Rat tissue lysates were prepared by grinding tissues under liq- 
uid nitrogen, adding sample buffer (100 mM Tris pH 6.8, 4 M 
Urea, 2% SDS, 50 mM DTT and 15% sucrose) and heating at 
65°C 10 min followed by sonibath sonication. Loading was nor- 
malized with GAPDH antibodies. Mitochondria were prepared 
by pelleting a liver post-nuclear supernatant at 11,000 x g 15 min 
and lysing in sample buffer. To increase lamina solubility, liver 
NE and microsomes were incubated on ice in 50 mM TRIS-HCl 
pH 7.4, 150 mM NaCl, 2 mM MgCl2, 0.2% NP-40 with 
protease inhibitors, then heated at 65°C for 2 min and sonicated 
in a 4°C sonibath. Protein concentrations were determined by 
Bradford assay before adding sample buffer. 

For the blots in Figures 1E and 5, after quantification of protein 
levels in the lysates, equal amounts of protein were loaded for NEs 
and microsomes. For the blots in Figure 1F, mitochondrial lysates were 
loaded so that porin levels matched those in total cell lysates and 
NE lysates were loaded so that the lamin levels matched those 
in total cell lysates. Blots shown in Figure 1 were run according 
to standard procedures, visualizing bands with ECL reagent. 
For Figure 5, protein bands were visualized and quantified 
with IR800-conjugated secondary antibodies using a LI-COR 
Odyssey with median background subtraction; averages from 
three independent blots are plotted. 

RT-PCR. All human total tissue RNAs for RT-PCR reactions 
were obtained from Stratagene except for peripheral blood leu- 
kocytes (PBL). In this case RNA was isolated using Trizol from 
cells prepared as for the blood leukocyte proteomics. Reactions 
were performed with 10 ng of the tissue RNAs using the Titan 
one tube RT-PCR system (Roche) according to manufacturer's 
instructions, except that dNTP concentration was increased 
to 500 μM and MgCl2 to 3 mM. Typical reaction conditions 
were 30 min reverse transcription at 50°C, 2 min denaturation 
at 94°C, then 24 cycles of 94°C for 30 sec, 60°C for 30 sec and 
68°C for 45 sec. Peptidylprolyl isomerase A (PPIA) was used as 
a loading control and reactions were typically repeated at least 
three times when notable differences were observed. 

Immunofluorescence microscopy. For cryosections, fresh 
rat tissues cut into 2-3 mm cubes were embedded in Optimal 
Cutting Temperature Compound (Tissue-Tek) and snap-fro- 
zen in liquid nitrogen. Sections were cut on a Leica CM 1900 
Cryostat at 6-8 μm thickness and fixed in -20°C methanol. 
After rehydration, sections were incubated with NET antibodies 
overnight at 4°C followed by secondary antibodies as above. Images were 
recorded using an SP5 laser confocal system with 63 x oil 1.4 NA 
objective (Leica). Micrographs were saved as TIFF files and pre- 
pared for figures using Photoshop 8.0. 

Bioinformatics analysis. Proteins identified by SEQUEST 
were first mapped to an Ensembl gene. Human/Rat/Mouse 
orthologous groups were identified with Ensembl release 48''^ to 
remove redundancy and false variation that might have resulted 
from differences in human and rodent gene assignments. 
Orthologous group IDs were sorted according to run criteria 
(e.g., appearance in runs for different tissues, membrane helix 
status and 5 x higher dNSAF values in NEs vs microsomes) and 
compared using Venn diagrams to measure the level of tissue 
distinctness. Area proportional Venn diagrams were generated 
using Venn Diagram Plotter vl.4 from PNNL, US Department 
of Energy (http://omics.pnl.gov/software/VennDiagramPlotter. 
php). 
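
The region sizes that feed such a three-way Venn diagram can be obtained with plain set operations. The sketch below is illustrative only; the function name is hypothetical and the inputs are assumed to be collections of orthologous group IDs, one per tissue.

```python
def venn_regions(liver, muscle, blood):
    """Split three ID collections into the seven regions of a Venn diagram."""
    liver, muscle, blood = set(liver), set(muscle), set(blood)
    return {
        "liver_only": liver - muscle - blood,
        "muscle_only": muscle - liver - blood,
        "blood_only": blood - liver - muscle,
        "liver_muscle": (liver & muscle) - blood,
        "liver_blood": (liver & blood) - muscle,
        "muscle_blood": (muscle & blood) - liver,
        "all_three": liver & muscle & blood,
    }
```

The sizes of the tissue-only regions relative to `all_three` give a simple measure of tissue distinctness.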

Basic properties of proteins listed in supplemental tables were 
calculated using BioPerl modules'''' or EMBOSS.® PSORTII was 
used for the prediction of nuclear localization signals.'''' For pre- 
diction of transmembrane spans it is important to note that a very 
stringent set of criteria was used for those annotated as "TM" 
in the supplemental tables and used for the numbers generated 
comparing NETs in various figures. Only proteins that had pre- 
dictions by TM-HMM version 2.0c''^ AND which had no signal 
peptide (SP) prediction if they had only one predicted membrane 
span were used for this analysis. However, some NETs confirmed 
at the NE (including one of the original pre-proteomics NETs) 
did not have membrane spans predicted using these criteria. The 
detailed listing of TM-HMM and SP predictions is given in 
Table S3. 
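
The stringent "TM" rule described above reduces to a small predicate. This is a sketch with hypothetical argument names: it takes the number of predicted helices and a signal peptide flag as inputs and does not run TM-HMM or any signal peptide predictor itself.

```python
def is_stringent_tm(predicted_helices, has_signal_peptide):
    """Apply the 'TM' annotation rule to precomputed predictions.

    A protein qualifies if it has at least one predicted membrane span,
    unless its only predicted span coincides with a signal peptide call.
    """
    if predicted_helices < 1:
        return False
    if predicted_helices == 1 and has_signal_peptide:
        return False                   # single span may just be the signal peptide
    return True
```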



To plot heatmaps the log2 of the dNSAF scores for NETs 
within a particular tissue were z-transformed to standardize 
between experiments. 
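
The standardization step for the heatmaps might look like the sketch below. The function name is hypothetical, and two details are assumptions: zero dNSAF values are dropped because their logarithm is undefined, and the population standard deviation is used.

```python
import math
from statistics import mean, pstdev


def z_scores_log2(dnsaf_by_net):
    """z-transform the log2 dNSAF values of the NETs within one tissue."""
    logs = {net: math.log2(v) for net, v in dnsaf_by_net.items() if v > 0}
    mu = mean(logs.values())
    sd = pstdev(logs.values())         # population SD (assumption)
    if sd == 0:
        return {net: 0.0 for net in logs}
    return {net: (x - mu) / sd for net, x in logs.items()}
```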

Comparison of expression levels in different tissues was done 
by downloading microarray signal data from BioGPS 
(http://biogps.gnf.org/)^''*^ and calculating the fold-expression over the 
median value from a wide variety of mouse tissues tested in 
this transcriptome database. For Figure 3C just those proteins 
appearing in 60% or more of MudPIT runs were considered. 
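
The fold-over-median calculation is simple enough to sketch directly. The function name and the tissue labels in the usage note are hypothetical; the input is assumed to be one gene's microarray signal per tissue.

```python
from statistics import median


def fold_over_median(signal_by_tissue):
    """Express each tissue's signal as a fold of the median across tissues."""
    reference = median(signal_by_tissue.values())
    return {tissue: signal / reference for tissue, signal in signal_by_tissue.items()}
```

For a gene with signals of 400 in liver, 100 in muscle and 200 in blood, the median is 200, so liver scores 2.0 and muscle 0.5.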

Proteins were searched against the human protein reference 
database (HPRD, Johns Hopkins University) to identify protein 
complexes. These were compared between the individual NE 
data sets to determine how many complex components were iden- 
tified in each tissue and then restricted to those that had at least 
75% of complex components identified between all data sets. 
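
The per-tissue complex coverage can be sketched as follows. The function and argument names are hypothetical, and the 75% filter is interpreted as applying to the union of all tissue data sets, which is how the text reads but is not stated as code.

```python
def complex_coverage(complexes, tissue_proteins, min_overall=0.75):
    """Proportion of each complex's components found in each tissue.

    complexes:        dict complex name -> iterable of component IDs
    tissue_proteins:  dict tissue name -> iterable of identified protein IDs
    Complexes with less than min_overall of their components found
    across all tissues combined are dropped.
    """
    tissues = {t: set(p) for t, p in tissue_proteins.items()}
    coverage = {}
    for name, members in complexes.items():
        members = set(members)
        found_anywhere = set().union(*(members & p for p in tissues.values()))
        if len(found_anywhere) / len(members) < min_overall:
            continue
        coverage[name] = {t: len(members & p) / len(members) for t, p in tissues.items()}
    return coverage
```

Summing the per-tissue proportions for one complex gives the additive value plotted in Figure 6A, with a maximum of 3 for a complex fully recovered in all three tissues.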

Biologically interesting gene ontology (GO) -terms and their 
corresponding child terms were retrieved from the mySQL data- 
base http://amigo.geneontology.org.^'' To ensure a fair compari- 
son for term enrichment, only human-mapped genes in our data 
set were considered. These were compared with the genomic 
data set of human Ensembl genes using BioMart (http://www. 
biomart.org/) as well as those GO-defined as having nuclear 
localization. Only experimentally verified GO-functional anno- 
tations were used including EXP (Inferred from Experiment), 
IDA (Inferred from Direct Assay) and IPI (Inferred from Physical 
Interaction). For a given GO-term, the fraction of genes contain- 
ing that term or any of the child terms was calculated for all data 
sets. The fold-difference was calculated by dividing this fractional 
value from our data set of interest by the value from the reference 
group. GO terms were also used to identify potential contaminants 
as proteins with GO-targeting annotations for other organelles 
as follows: GO:0016023, cytoplasmic membrane-bounded 
vesicle; GO:0005794, Golgi apparatus; GO:0005739, mitochondrion; 
GO:0005773, vacuole; GO:0005768, endosome; 
GO:0005783, endoplasmic reticulum; GO:0042579, microbody; 
GO:0005856, cytoskeleton; GO:0005694, chromosome; 
GO:0005730, nucleolus; GO:0005840, ribosome. 